A Sequential Model for Discourse Segmentation

نویسندگان

  • Hugo Hernault
  • Danushka Bollegala
  • Mitsuru Ishizuka
چکیده

Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Reranking Model for Discourse Segmentation using Subtree Features

This paper presents a discriminative reranking model for the discourse segmentation task, the first step in a discourse parsing system. Our model exploits subtree features to rerank Nbest outputs of a base segmenter, which uses syntactic and lexical features in a CRF framework. Experimental results on the RST Discourse Treebank corpus show that our model outperforms existing discourse segmenter...

متن کامل

Estimation of Discourse Segmentation Labels from Crowd Data

For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations w...

متن کامل

Looking at discourse in a corpus: The role of lexical cohesion

This paper is aimed at reporting on the development and application of a computer model for discourse analysis through segmentation. Segmentation refers to the principled division of texts into contiguous constituents. Other studies have looked at the application of a number of models to the analysis of discourse by computer. The segmentation procedure developed for the present investigation is...

متن کامل

Does syntax help discourse segmentation? Not so much

Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced w...

متن کامل

Discourse Segmentation by Human and Automated Means

The need to model the relation between discourse structure and linguistic features of utterances is almost universally acknowledged in the literature on discourse. However, there is only weak consensus on what the units of discourse structure are, or the criteria for recognizing and generating them. We present quantitative results of a two-part study using a corpus of spontaneous, narrative mon...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010